Abstract 

Background: With a diversity of pigmented shell morphotypes governed by Mendelian patterns of inheritance, the 
common grove snail, Cepaea nemoralis, has served as a model for evolutionary biologists and population 
geneticists for decades. Surprisingly, the molecular mechanisms by which C. nemoralis generates this pigmented 
shelled diversity, and the degree of evolutionary conservation present between molluscan shell-forming proteomes, 
remain unknown. 

Results: Here, using next generation sequencing and high throughput proteomics, we identify and characterize the 
major proteinaceous components of the C. nemoralis shell, the first shell-proteome for a pulmonate mollusc. The 
recent availability of several marine molluscan shell-proteomes, and the dataset we report here, allow us to identify 
59 evolutionarily conserved and novel shell-forming proteins. While the C. nemoralis dataset is dominated by proteins that 
share little to no similarity with proteins in public databases, almost half of it shares similarity with proteins present in other 
molluscan shells. In addition, we could not find any indication that a protein (or class of proteins) is directly associated with 
shell pigmentation in C. nemoralis. This is in contrast to the only other partially characterized molluscan-shell pigmentation 
mechanism employed by the tropical abalone Haliotis asinina. 

Conclusions: The unique pulmonate shell-forming proteome that we report here reveals an abundance of both 
mollusc-specific and pulmonate-specific proteins, suggesting that novel coding sequences, and/or the extensive 
divergence of these sequences from ancestral sequences, supported the innovation of new shell types within the 
Conchifera. In addition, we report here the first evidence that molluscs use independently evolved mechanisms to 
pigment their shells. This proteome provides a solid foundation from which further studies aimed at the functional 
characterization of these shell-forming proteins can be conducted. 
Background 

The evolutionary origins, mode of construction, pattern- 
ing, and physical properties of the molluscan shell have 
held the attention of scientists for centuries. However, 
the molecular mechanisms by which these structures are 
constructed are only beginning to be elucidated [1-3]. 
The molluscan shell is assembled extracellularly and is an 
ensemble of CaCO3 and organic macromolecules (proteins, 
pigments, glycoproteins, lipids and polysaccharides) 
which are secreted by an organ known as the mantle. The 
anterior edge of the mantle underlies the lip of the shell 
and directs the ordered biomineralization of the different 
structural layers of the shell and also controls the de- 
position of pigment features. With advances in nucleic 
acid sequencing technologies and proteomic methods, 
the close to complete shell-forming proteomes of several 
molluscs have now been reported [4-7]. Several proteins 
from these collections have been more fully characterized 
[8-11]. However, the vast majority of these previous studies 
are focused on marine species. While Pavat et al. [12] re- 
cently reported the biochemical properties of the shell 
forming proteome of the pulmonate Helix aspersa maxima, 
the lack of any transcriptome or genome data for this 
species limited their proteomic analyses. They were able 
to characterize nine distinct 2D spots, of which seven 
returned a total of 14 peptides ranging in length from 4 to 
11 residues. A full proteome-scale dataset from a pulmonate 
gastropod would efficiently highlight the conserved and 
lineage-specific molecular mechanisms of molluscan shell 
formation, and would provide deep insight into how these 
proteomes evolve. This is because marine and terrestrial 
shell-forming molluscs have adapted to significantly differ- 
ent environments that would fundamentally affect both the 
process of shell formation, and the stability of the secreted 
composite biomineral; e.g. the abundance and biological 
availability of calcium, environmental pH, temperature, UV 
radiation, humidity and so on. While proteins involved in 
the process of shell formation in different molluscan line- 
ages could be expected to have evolved in response to 
these different selective pressures, the signature of an an- 
cestral shell-forming program may still be recognizable. 

The common grove snail, Cepaea nemoralis (Figure 1), 
has long been studied by ecologists and population ge- 
neticists [13-15]. Key insights were gained during the 
1950s and 1960s when it was demonstrated that variation 
in pigmented shell traits of Cepaea is inherited in 
a Mendelian fashion [16,17]. Furthermore, the frequen- 
cies of these morphotypes have been suggested to be in- 
fluenced by two agents of natural selection: predation 
by birds [18,19], and climatic conditions [20]. Despite 
this long history of research concerning the variable pig- 
mentation of the Cepaea shell, there is only one study 
that has aimed to specifically identify the genes that 
control this morphological variety [21]. Using RAD-Seq 
(Restriction site Associated DNA Sequencing), Richards et al. recently 
identified 44 anonymous markers putatively linked 
to loci that control the shell ground color and the presence 
or absence of dark brown bands on the Cepaea shell. Yet 
despite the association of carotenoids, porphyrins, carbo- 
hydrates and polyenes with some molluscan shell pigments 
[22-25], there exists no example of a complete molecular 
understanding of any shell pigmentation mechanism in 
any mollusc. In one case, the molecular basis of a molluscan 
shell pigment has been partially elucidated. One of us 
previously demonstrated that the protein Sometsuke is 
directly associated with the blue and red pigmentation of 
the juvenile shell of the tropical abalone Haliotis asinina 
[4]. This previous finding motivated us to search for 
proteins associated with the various pigments within 
the C. nemoralis shell using high throughput transcrip- 
tomic and proteomic methods. This effort has allowed 
us to assemble a dataset that is likely to represent the 
majority of the shell forming proteome of C. nemoralis. 
This in-depth proteome is the first to be reported from 
a pulmonate gastropod, and also allows us to conduct 
comparisons between it and several other recently reported 
shell-forming proteomes from marine species. 




Figure 1 Representative polymorphic shells of C. nemoralis surveyed for their protein contents. A-C. Examples of the three main shell 
types we surveyed for both shell-forming proteins and protein-associated pigments. D. In order to identify pigment-associated proteins, C. nemoralis 
shells were crushed and divided into one of three pigmented fractions for subsequent proteomic analyses. With this approach, differentially localized 
proteins would be visible on LDS-PAGE gels. 
Results and discussion 

General character of the C. nemoralis shell-forming proteome 

We were able to retrieve a total of 553 proteins/protein 
groups from the shell proteome of C. nemoralis using two 
different preparative techniques (including or excluding a 
sodium hypochlorite-plus-sonication pretreatment step). 
Shell material that had not been washed yielded 418 pro- 
teins, while shell material washed with sodium hypochlorite 
and sonication yielded 525 proteins. 382 proteins were 
present in both of these datasets. A list of accepted identifi- 
cations is provided in Additional file 1. The MaxQuant out- 
put files containing relevant parameters, such as scores, 
sequence coverage and number of peptides are provided 
for proteins/protein groups in Additional file 2 (matrix ex- 
tracted without hypochlorite pre-treatment) and Additional 
file 3 (with hypochlorite pre-treatment). Additional file 4 
(without hypochlorite treatment) and Additional file 5 (with 
hypochlorite treatment) contain the MaxQuant output files 
for peptides. All MaxQuant output files also contain proteins 
that were rejected after manual validation of the results and are 
therefore not included in the list of accepted identifications 
in Additional file 1. An iBAQ estimate of protein abun- 
dance suggests that 59 proteins/protein groups constitute 
more than 93% of the total Cepaea shell proteome (Table 1). 
This is a conservative collection of proteins and peptides 
that met or exceeded stringent bioinformatic and statistical 
criteria (see Methods). With further work (for example 
more complete Cepaea transcriptomic and/or genomic 
datasets against which to query the MS data) this estimate 
will likely increase. It has been reported for corals that 
in order to reduce the presence of contaminating non- 
biomineralizing proteins, an extensive sodium hypo- 
chlorite cleaning step must be performed on the finely 
powdered coral biomineral [26]. If this is not done then 
abundant cytoskeletal proteins such as actins, tubulins 
and myosins (which are unlikely to be directly involved 
in biomineralization) will carry through into the final 
biomineralization dataset [27]. While we observed subtle 
differences in the LDS-PAGE profiles of proteins derived 
from C. nemoralis shell material prepared with and with- 
out a hypochlorite pre-treatment (Figure 2), the proteomic 
data generated by the two methods were not fundamen- 
tally different. 
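
The abundance summary described above (the share of the identifiable proteome taken up by proteins above an iBAQ percentage cutoff) can be illustrated with a minimal Python sketch. It assumes that a raw iBAQ intensity is already available for each protein; the function names and the example intensities are hypothetical, and only the 0.1% cutoff is taken from the caption of Table 1.

```python
def ibaq_percentages(ibaq):
    """Convert raw iBAQ intensities into percentages of the summed intensity."""
    total = sum(ibaq.values())
    if total <= 0:
        raise ValueError("summed iBAQ intensity must be positive")
    return {name: 100.0 * value / total for name, value in ibaq.items()}


def major_proteins(ibaq, cutoff_percent=0.1):
    """Return (name, percent) pairs above the cutoff, most abundant first."""
    shares = ibaq_percentages(ibaq)
    kept = [(name, pct) for name, pct in shares.items() if pct > cutoff_percent]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical intensities for illustration only (not values from this study).
example = {
    "protein_a": 5000.0,
    "protein_b": 3000.0,
    "protein_c": 1999.0,
    "protein_d": 1.0,
}
major = major_proteins(example)
# Share of the identifiable proteome covered by the proteins above the cutoff.
covered = sum(pct for _, pct in major)
```

In this toy input, `protein_d` falls below the cutoff, so `major` holds three proteins and `covered` is their summed percentage.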

The overall composition of the C. nemoralis shell prote- 
ome is dominated by uncharacterized and/or novel pro- 
teins (Table 1). Indeed the four most abundant proteins of 
the identifiable C. nemoralis proteome (constituting >52% 
of the total shell proteome) did not share significant simi- 
larity with any entries in UniProt (Table 1). Furthermore, 
31 out of the 59 distinct proteins (52.5% of all proteins 
which, by abundance, account for 80% of the identifiable 
proteome) did not return hits against UniProt. This largely 
unique proteome reflects the situation for the majority of 
previously reported molluscan shell forming proteomes, 
and highlights the need for the development of reliable 
and repeatable in vivo functional assays in these systems. 
The most abundant protein in the C. nemoralis shell, 
accounting for more than 25% of the total identifiable 
proteinaceous material, was isotig_123 (Table 1). This 
Gly- and Pro-rich sequence did not share any BLASTp 
similarity with proteins in UniProt or Refseq, and no 
domains could be identified by Pfam or HMM searches. 
The two most abundant C. nemoralis shell proteins to 
share similarity with other proteins were isotig_2668 and 
isotig_821 (Table 1). C. nemoralis isotig_2668 (accounting 
for 3.87% of the shell proteome) shares significant similar- 
ity with a human poly-domain protein named SEL-OB/ 
SVEP1. This human protein is present in osteogenic tissues 
and plays roles in cell adhesion [28]. Isotig_821 possesses 
a chitin-binding peritrophin-A domain, and shares 
similarity with several other molluscan proteins including 
BSMP (Blue Mussel Shell Protein) from P. vulgata and 
a large multi-domain containing protein from C. gigas 
(Sushi, von Willebrand, EGF and chitin-binding domains). 
Conspicuous in their abundance were several other C. 
nemoralis proteins that also possessed other recognizable 
chitin-interacting domains (14003, 1323, 101824, 84589, 
63304, 248122 and 31170), suggesting that chitin is an im- 
portant organic component of the C. nemoralis shell. 

Many of the proteins identified in our proteomic ana- 
lysis possess unusually high proportions of certain amino 
acids. These types of proteins are often found in mollus- 
can biominerals [5,29,30]. For example, Lustrin (a pro- 
tein isolated from the nacreous layer of the Californian 
red abalone Haliotis rufescens) contains a 272-residue do- 
main rich in Gly (31%) and Ser (61%) residues. This do- 
main has been suggested to act as an extensor molecule 
and to impart fracture resistance to the abalone shell [10]. 
However, it must be pointed out that this function is yet to 
be experimentally verified. While we cannot assign functions 
to any of the various C. nemoralis proteins that contain 
domains rich in amino acids such as Gln, Asp, Pro, 
Gly and Met, their abundance and diversity in this dataset 
hint at the important role they must play in shell formation. 
Interestingly, one of these proteins (isotig_7807), 
which consists of 13% Ala and 11% Gln, also contains 
a Whey Acidic Protein (WAP) domain, which is also 
present in the abalone Lustrin proteins [10,31]. WAP do- 
mains are thought to possess protease inhibitor activity 
due to the presence of 4-disulphide core (4-DSC) residues, 
which serine protease inhibitors also possess [32]. A di- 
versity of protease inhibitor domain-containing proteins 
has been observed in other molluscan shell forming 
proteomes [6,29,33]. The presence of protease inhibitors 
in an external structure such as the molluscan shell may 
provide the shell with an ability to resist the digestive 
enzymes secreted by fouling organisms and predators 
that would dissolve and bore through the shell, for 



Table 1 The major proteins and peptides of the C. nemoralis shell: 59 proteins and peptides (with an iBAQ percentage of more than 0.1) constitute 93% of the 
identifiable C. nemoralis shell proteome 
Isotig/contig | UniProt best hit (species) | e-value | Identity | Features | | | iBAQ (%)
[...] | 11 | [...]
[...] | J7Q5J6 (Patella vulgata) | 8.7e-11 | 29.9% | Similar to BSMP; Domains: CBM_14/CHIT_BIND_II | S<1 | 15 | 2.45
4164 | None | | | Similarity to UP2_HALAI (e-value 0.19; 28.1% identity) | S>1 | 3 | 2.07
58150 | K1QZ49 (Crassostrea gigas) | 1.2e-40 | 35.1% | Similar to adipocyte plasma membrane-associated protein; Domains: TolB_like/strictosidine synthase; see also contig_221 and contig_16710; SP | S<1 | 16 | 1.98
5087 | None | [...]
35852 | [...]
7809 | None | [...] | S<1 | [...] | 0.83
[...] | J7QJT8 (Patella vulgata) | 6.3e-25 | 34.7% | Domains: αCA; SP | S<1 | 42 | 0.59
3938 | None | | | SP | S>1 | [...] | 0.58
1188 + 4282 | K1P9P0 (Crassostrea gigas) | 1.2e-27; 4.4e-21 | 47.0%; 50.9% | Mesenchyme-specific cell surface glycoprotein; Domains: WD40/YVTN repeat-like | S<1 | 15 | 0.53
7563 | None | [...]
[...] | None | | | (11% A, 11% G, 12% M) | [...]
[...] | None | | | (31% Q, 23% P) | S>1 | [...] | 0.48
269 | None | | | (18% G) | S<1 | 8 | 0.44
14003 | K1PRD3 (Crassostrea gigas) | 3.6e-22 | 30.1% | Similar to IgGFc-binding protein; Domains: CBM_14/CHIT_BIND_II; shares peptide with 101824 | S,1 | 18 | 0.38
1647 | None | - | - | Domains: Sushi/SCR/CCP; (12% P, 11% S, 11% T) | S>1 | 9 | 0.38
450 | 68CYM6 (Physella acuta) | 9.9e-67 | 74.2% | G-type lysozyme; SP | S<1 | 5 | 0.36
1323 | A7T0W4 (Nematostella vectensis) | 1.3e-26 | 40.5% | Domains: polysaccharide deacetylase/chitinase | S<1 | 7 | 0.32
101824 | K1QJK2 (Crassostrea gigas) | 2.1e-22 | 29.4% | Domains: CBM_14/CHIT_BIND_II | S<1 | 29 | 0.31
84589 | K1QIK2 (Crassostrea gigas) | 5.4e-41 | 29.2% | Domains: CBM_14/CHIT_BIND_II; also see contig_101824 | S<1 | 29 | 0.30
132 | None | | | (12% Q) | S<1 | 21 | 0.29
28994* | K1QPM9 (Crassostrea gigas) | 1.5e-22 | 45.6% | Similar to fatty acid-binding protein, brain | S,1 | 7 | 0.24
32297 | None | - | - | (14% G, 10% P); SP | S>1 | 2 | 0.23
74063 | None | | | (10% G, 13% L, 20% P); SP | S,1 | 1 | 0.22
263 | None | - | - | | S<1 | 1 | 0.21
63304 | K1Q365 (Crassostrea gigas) | 9.6e-55 | 29.8% | AA 127-818 similar to lactadherin; Domains: CBM_14/CHIT_BIND_II | S<1 | 26 | 0.21
227 | D3BGG3 (Polysphondylium pallidum) | 8.1e-5 | 23.9% | Similar to Zipper-like Domains-containing protein | S<1 | 14 | 0.20
691 | A5Z1D6 (Cernuella virgata) | 1.4e-89 | 43.3% | Similar to epiphragmin; Domains: Fibr_C; SP | S<1 | 31 | 0.20
2357 | None | - | - | AA 395-528 similar to Domains: LDL_recept_a; (15% J); SP | S,1 | 12 | 0.19
572* | IFEA (Helix aspersa) | 1.7e-118 | 96.9% | Non-neuronal cytoplasmic intermediate filament protein | S<1 | 32 | 0.19
7323 | A7RQD5 (Nematostella vectensis) | 6.2e-41 | 32.2% | SP | S<1 | 25 | 0.19
3883 | K1Q9V3 (Crassostrea gigas) | 5.9e-190 | 86.2% | V-type proton ATPase catalytic subunit A | S<1 | 27 | 0.17
943* | G0ZGZ8 | 5.2e-38 | 97.8% | Actin | S<1 | 3 | 0.16
[...] | R7VB66 (Capitella teleta) | 3.0e-31 | 54.5% | SP | S<1 | 6 | 0.16
104312 | R7TB34 (Capitella teleta) | 2.4e-5 | 45.5% | Similar to α-carbonic anhydrase; Domains: α-CA_2 | S,1 | 4 | 0.14
196388 | | | | (17% A, 11% G, 11% P, 14% S) | S<1 | 1 | 0.14
2858 | H2ZUY5 (Latimeria chalumnae) | 5.5e-41 | 44.6% | Similar to adipocyte plasma membrane-associated protein; Domains: TolB_like/strictosidine:synthase_related | S<1 | 11 | 0.14
248122 | C3Z1I6 (Branchiostoma floridae) | 6.1e-15 | 38.8% | Similar to chitinase; Domains: glyco_hydro_18/chitinase | S<1 | 4 | 0.13
7807 | None | - | - | Domains: WAP; (12% A, 11% Q); SP | S<1 | 15 | 0.12
1237 | K1QI28 (Crassostrea gigas) | 1.9e-191 | 89.8% | V-type proton ATPase subunit B | S<1 | 18 | 0.11
20308 | Q2LZN0 (Drosophila pseudoobscura) | 1.6e-9 | 27.0% | AA 49-467 similar to Dpse/GA10422/alkaline phosphatase; Domains: alkaline phosphatase; SP | S<1 | 14 | 0.11
20360 | H9K6W1 (Apis mellifera) | 3.9e-37 | 37.5% | Similar to cadherin; Domains: cadherin; see also contig_75801 | S<1 | 8 | 0.11
31170 | K7S108 (Propionibacterium acidipropionici) | 3.2e-37 | 35.0% | Domains: CBM_14/CHIT_BIND_II; SP | S<1 | 17 | 0.11

All proteins listed here were found in all pigment fractions (yellow, orange and dark brown) with the exception of isotig_169764, which was only found in yellow and orange fractions. Proteins and peptides are listed 
in order of abundance expressed as a percentage of the total shell proteome that we can identify. 

^The percentages of particularly abundant amino acid residues are given in brackets. "SP" indicates the presence of a signal peptide, "TM" indicates a likely trans-membrane protein. 
Figure 2 A comparison of the protein-associated pigments and PAGE profiles of proteins isolated from the shells of H. asinina and 
C. nemoralis. A. A representative SDS-PAGE gel of proteins isolated from shells of juvenile H. asinina. This gel is unstained (the protein marker is 
pre-stained) allowing the red and blue pigment-associated proteins to be visualized (arrows). B. A representative LDS-PAGE analysis of C. nemoralis 
acid-soluble and -insoluble proteins extracted from shell fragments prepared either with or without a sodium hypochlorite-plus-sonication pre-treatment. 
Protein extractions which displayed such electrophoretic patterns were subjected to FASP (Filter-Aided Sample Preparation) and 
proteomic analysis. 



example naticid gastropods [34], polydorid annelids [35] 
and sponges [36]. 
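
The amino acid biases discussed in this paragraph (for example the Gly-, Pro- and Gln-rich proteins, and the bracketed percentages in Table 1) can be computed directly from a protein sequence. The following Python sketch is illustrative only: the 10% reporting threshold is an assumption based on the smallest values listed in Table 1, and the toy sequence is hypothetical rather than a C. nemoralis protein.

```python
from collections import Counter


def residue_percentages(sequence):
    """Return the percentage of each residue in a one-letter protein sequence."""
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def enriched_residues(sequence, threshold_percent=10.0):
    """List (residue, percent) pairs at or above the threshold, highest first.

    The 10% default is an assumed reporting threshold, not one stated in the text.
    """
    shares = residue_percentages(sequence)
    kept = [(aa, round(pct, 1)) for aa, pct in shares.items() if pct >= threshold_percent]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Hypothetical 10-residue toy sequence: 4 Gly, 3 Pro, 2 Gln, 1 Ala.
toy = "GGGGPPPQQA"
enriched = enriched_residues(toy)
```

For the toy sequence, `enriched` lists Gly, Pro, Gln and Ala in decreasing order of abundance.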

We also detected a neurofilament protein (isotig_572) 
present in the C. nemoralis shell. While this may at first 
be considered a non-biomineral associated contaminant, 
a similar protein was reported from the shells of Helix 
aspersa [12], and we also note the presence of such a 
protein in the shell proteome of L. gigantea (see below). 
The presence of such presumably intra-cellular proteins 
in extra-cellular structures such as the molluscan shell 
is difficult to reconcile with our current understanding 
of how such biominerals are formed, and serves to highlight 
how far we are from a complete understanding of 
these processes. While a conventional model of shell for- 
mation would account for the presence of such proteins 
through the non-specific occlusion of cells and cellular 
debris into a growing face of a biomineral, there are 
alternative models that should perhaps be considered. 
The biophysical properties of filament proteins have been 
well studied, and they are known to be able to reversibly 
deform to several times their own length [37-39]. The 
fracture resistance of the molluscan shell, 
which exceeds that of pure CaCO3 by several orders of 
magnitude, is imparted to the biomineral by the organic 
components of the shell. A non-canonical secretory path- 
way for filament proteins, or the specific integration of 
filament-rich cells into the growing shell may be a mech- 
anism by which the shell acquires such biomechanical 
properties. However, such hypotheses require further ex- 
perimental investigation. 

A previously described molluscan shell matrix protein, 
dermatopontin, was suggested to be the major protein- 
aceous component of the shell of the freshwater snail 
Biomphalaria glabrata [40,41]. Dermatopontin was also 
reported from the shells of other gastropods [42], and 
bivalves, where it is thought to play a role in nacre for- 
mation [43], and can also be found in taxa ranging from 
bacteria to humans. Interestingly, we did not detect Der- 
matopontin in the shell of C. nemoralis. To investigate 
this further we constructed an HMM profile of mollus- 
can Dermatopontin proteins and used a local installation 
of HMMsearch [44] to query this against our translated 
C. nemoralis transcriptome. This search returned two significant 
hits (contig_46837 e-value 6.8e-27; contig_162693 
e-value 5e-06). These contigs contain clear Dermatopon- 
tin domains, with contig_46837 also possessing a signal 
sequence. This discrepancy between the presence of Der- 
matopontin transcripts in the C. nemoralis mantle tran- 
scriptome, and the absence of Dermatopontin proteins in 
the shell-proteome may be a technical artifact, or a bio- 
logical reality. While the diverse technical challenges of 
working with samples such as biominerals make the first 
possibility likely, the second scenario should also be con- 
sidered in light of the apparent evolvability of molluscan 
shell forming secretomes [4,29,33]. Shell-forming genes 
under recent negative selection pressures could conceivably 
still be transcribed, but not translated or actively involved 
in shell formation. However, such a scenario requires 
additional investigation. 
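
As an illustration of how hits such as those reported for the Dermatopontin profile search can be extracted, the Python sketch below filters the tabular (`--tblout`) output of hmmsearch by full-sequence e-value. Only the two contig names and their e-values come from the text above; the profile name, the scores, the third (rejected) target and the 1e-5 cutoff are assumptions made for the example.

```python
def parse_tblout(lines, max_evalue=1e-5):
    """Collect (target, query, e-value, score) from hmmsearch --tblout lines.

    In this format, fields 1 and 3 are the target and query names, and
    fields 5 and 6 are the full-sequence e-value and bit score.
    The default cutoff is an assumption, not a value given in the text.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(maxsplit=18)
        target, query = fields[0], fields[2]
        evalue, score = float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, query, evalue, score))
    return sorted(hits, key=lambda hit: hit[2])


# Scores and the profile name are illustrative; e-values for the two contigs
# are those reported in the text. The third target is hypothetical.
example_lines = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "contig_162693 - dermatopontin_profile - 5e-06 25.0 0.0 6e-06 24.8 0.0 1.0 1 0 0 1 1 1 1 -",
    "contig_46837 - dermatopontin_profile - 6.8e-27 95.0 0.1 7.5e-27 94.8 0.1 1.0 1 0 0 1 1 1 1 -",
    "contig_hypothetical - dermatopontin_profile - 0.5 8.0 0.0 0.6 7.9 0.0 1.0 1 0 0 1 1 0 0 -",
]
hits = parse_tblout(example_lines)
```

The hits are returned in order of increasing e-value, so the strongest match comes first.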

While a high proportion of the 59 proteins identified in 
the shell of C. nemoralis did not share any BLAST similar- 
ity with proteins in UniProt, some of them did contain do- 
mains that could be recognized by HMM searches. These 
are indicated in column 5 of Table 1. Many of these 
hits were against "uncharacterized" domains or proteins of 
unknown function. In some cases trans-membrane (TM) 
regions could be identified. Such a finding is interesting as 
it leads to the question of how such membrane-embedded 
proteins could be located within the mature biomineral. 
Such a finding was also recently reported in a dataset 
of coral biomineralizing proteins [45] and several other 
studies [46,47]. In the coral study of Ramos-Silva et al. the 
majority of the MS peptides identified in the biomineral 
could only be matched against the extra-cellular regions 
of trans-membrane proteins, suggesting that these extra- 
cellular domains are specifically cleaved from the trans- 
membrane portions of TM proteins. We observe a similar 
phenomenon in our C. nemoralis data. Seventy-two peptides 
were observed in the MS data for isotig_5087; of 
these, 70 were located in the putative extra-cellular domain 
of the protein (Additional file 6). The role that trans- 
membrane proteins play in molluscan shell-formation has 
thus far received little attention. 
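
The kind of tally reported for isotig_5087 (peptides falling inside versus outside a putative extra-cellular domain) can be sketched as follows. This is a minimal illustration in Python: the peptide coordinates and the domain boundaries are hypothetical, and the sketch assumes that peptides have already been mapped to 1-based, inclusive positions on the protein sequence.

```python
def count_in_region(peptides, region_start, region_end):
    """Count peptides lying entirely within [region_start, region_end].

    peptides: list of (start, end) tuples, 1-based and inclusive.
    Returns (number inside the region, total number of peptides).
    """
    inside = [
        (start, end)
        for start, end in peptides
        if start >= region_start and end <= region_end
    ]
    return len(inside), len(peptides)


# Hypothetical peptide positions and a hypothetical extra-cellular region 1-200.
peptides = [(12, 25), (30, 44), (101, 115), (180, 196), (410, 422)]
n_inside, n_total = count_in_region(peptides, region_start=1, region_end=200)
fraction_inside = n_inside / n_total
```

Peptides that only partly overlap the region are counted as outside it, which is a conservative design choice for this sketch.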

Pigmentation of the C. nemoralis shell is not directly 
associated with a proteinaceous component 

Previously, one of us described the Sometsuke protein 
from the shell of Haliotis asinina [4]. This protein is 
most likely coupled to a chromophore, which is involved 
in imparting both the red and blue colors to the juvenile 
abalone shell (Figure 2A), and is perhaps the currently 
best understood molluscan shell-pigmentation mechanism 
at the genetic level. One of our primary motivations for 
the current work was to determine whether C. nemoralis 
also uses a protein-associated pigmentation mechanism 
to pattern its shell, and if so, to identify those proteins. 



Multiple protein extractions from a variety of C. nemor- 
alis shells, including protocols without the potentially 
destructive washing with hypochlorite, suggested that 
this was not the case. Pigmented LDS-PAGE bands (as 
per Sometsuke from H. asinina) were never observed 
(Figure 2B). A dark brown material, which consistently 
accompanied extractions from dark brown shell frac- 
tions, remained predominantly in the PAGE sample 
buffer-insoluble material that was removed by centri- 
fugation before electrophoresis in order to obtain clear 
and comparable LDS-PAGE electropherograms. While it 
could be argued that an insoluble C. nemoralis pigment- 
associated protein would need to be rendered soluble in 
order for it to be visualized on a gel, we point out that 
the denaturation treatments applied to the C. nemoralis 
samples (70°C for 10 minutes in detergent-containing 
loading buffer with mercaptoethanol) were more than 
adequate to solubilize the water-insoluble Sometsuke 
protein (Figure 2A). 

Unfortunately we were unable to relatively quantify 
the proteins associated with each of the three pigment 
classes using a MaxQuant-implementation of label-free 
quantitation (LFQ) due to the significantly different solu- 
bility behavior of the protein extracts (see below). How- 
ever, a second line of evidence suggests that C. nemoralis 
shell pigments are not associated with proteins. A qualita- 
tive assessment of the 59 proteins extracted from the shell 
(derived from three different pigment fractions) reveals 
that 58 were present in all three pigment fractions. These 
58 proteins (which passed our stringent quality filters) ac- 
count for >93% of the identifiable proteome. If we assume 
that a C. nemoralis pigment-associated protein would be 
at least moderately differentially abundant between the 
three pigment fractions (see Figure 1D, and as is certainly 
the case for Sometsuke in H. asinina, see [4]), such a protein 
should either be easily observable on SDS/LDS-PAGE 
gels, or qualitatively differentially distributed across the 
three pigment fractions. The only protein to be qualita- 
tively differentially distributed across the three pigment 
fractions was isotig_169764. This sequence was only detected 
in yellow and orange fractions, and is apparently 
highly conserved as it shares significant similarity with 
proteins in organisms ranging from bacteria (10e-23) 
and green algae (2e-05) to hemichordates (7e-34) and 
segmented worms (6e-46). Despite this conservation, there 
are no recognizable functional domains in the C. nemoralis 
orthologue of this protein. It was also relatively rare at just 
0.16% of the total identifiable proteome, suggesting that it 
is unlikely to be directly involved in pigmenting the shell. 
Considering all of these points, our favored interpretation 
is that C. nemoralis shell pigments are not associated 
with proteins, and most likely have no homology with 
the Sometsuke pigmentation mechanism employed by 
H. asinina. If correct, this indicates that molluscan shell 
pigmentation mechanisms may have multiple evolutionary 
origins. However, this scenario requires further investigation 
and experimental evidence. 
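
The qualitative comparison used above (which proteins are present in all three pigment fractions, and which are not) amounts to simple set operations. The Python sketch below is illustrative only: the function name is hypothetical and the example sets contain just three of the identifiers mentioned in the text, with isotig_169764 placed in the yellow and orange fractions as reported.

```python
def differentially_distributed(fractions):
    """Split protein IDs into those shared by all fractions and the rest.

    fractions: dict mapping a fraction name to the set of protein IDs found in it.
    Returns (shared IDs, {ID: sorted list of fractions containing it}).
    """
    all_ids = set().union(*fractions.values())
    shared = set.intersection(*fractions.values())
    partial = {
        pid: sorted(name for name, ids in fractions.items() if pid in ids)
        for pid in sorted(all_ids - shared)
    }
    return shared, partial


# A reduced example using three identifiers from the text.
fractions = {
    "yellow": {"isotig_123", "isotig_2668", "isotig_169764"},
    "orange": {"isotig_123", "isotig_2668", "isotig_169764"},
    "dark brown": {"isotig_123", "isotig_2668"},
}
shared, partial = differentially_distributed(fractions)
```

Here `partial` singles out isotig_169764 together with the two fractions in which it was detected.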

The LDS-PAGE profiles of acid-soluble vs. acid-insoluble 
proteins isolated from the three different C. nemoralis 
shell-pigment fractions differed significantly (Figure 3). 
Essentially, the profiles of soluble proteins derived from 
Yellow and Orange fractions were similar to each other, 
and were in general more abundant and heterogeneous 



than the soluble proteins isolated from the Dark Brown 
fraction. In contrast, the acid-insoluble proteins isolated 
from the Dark Brown fraction were more abundant than 
those isolated from the Yellow or Orange fractions. Des- 
pite this difference, the prominent bands present in 
yellow-soluble, orange-soluble and brown-insoluble ap- 
pear to be largely similar (Figure 3). While we cannot 
explain this observation, it is clear that the biochemical 
properties of the proteins present in the brown fraction 
are qualitatively different from those in the yellow and orange 
fractions. This may be the result of post-translational 
modifications that affect the solubility of each protein 
fraction, such as different degrees of cross-linking. However, 
this idea requires further investigation. 

Figure 3 Representative LDS-PAGE analyses of soluble and insoluble proteins derived from three different C. nemoralis shell-pigment 
fractions (yellow, orange and brown). A. Soluble proteins. B. Insoluble proteins. The most striking difference between these fractions was the 
abundance of the proteins present in the yellow-soluble and orange-soluble fractions relative to the brown-soluble fraction. This pattern is reversed in 
the acid-insoluble fractions. 

Comparisons of molluscan shell forming proteomes 

With the C. nemoralis shell proteome in hand we were 
able to conduct broad level comparisons against five other 
molluscan shell forming proteomes. Importantly, none of 
these six datasets is transcriptome (RNA)-based; rather, all 
are primarily composed of proteins that have been isolated 
from the shells of the respective species (mapped back 
to either RNA-Seq-scale mantle transcriptomes or genome 
assemblies), and are therefore likely to be somehow directly 
involved in shell formation. To our knowledge, this is the 
first time such a proteome level comparison of molluscan 
shell forming proteins has been made. 

Of the 59 proteins we isolated from the C. nemoralis 
shell, 28 (47.5%) shared similarity (at an e-value threshold 
of 10e-6) with one or more proteins derived from 
the five other molluscan shell proteomes we investigated 
here (Figure 4). Interestingly, only one C. nemoralis protein 
shared similarity with any of the 94 H. asinina shell 
forming proteins. This single protein shares no signifi- 
cant similarity with any proteins in public databases and 
contains no identifiable conserved domains. It was pre- 
viously reported that the shell forming proteome of 
H. asinina is highly divergent from other such molluscan 
proteomes, and that this could be interpreted as evidence 
of a rapidly evolving shell-forming secretome [4]. Given 
that C. nemoralis and H. asinina share more recent common 
ancestry than C. nemoralis does with any of the three 
bivalves investigated here (all three of which include more 
proteins with similarity to the C. nemoralis proteome) the 
result we report here appears to support that hypothesis. 
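
The proportion of C. nemoralis proteins with a significant match in another shell proteome can be derived from tabular BLASTp output, as in the following Python sketch. It assumes the standard 12-column tabular format (query ID in column 1, e-value in column 11). The threshold is passed literally as written in the text (`10e-6`, which Python evaluates as 1e-5); whether 1e-6 was intended is not stated. The subject identifiers, the remaining column values and the third query are hypothetical; the two e-values are those quoted for L. gigantea in this section.

```python
def queries_with_hits(blast_lines, max_evalue):
    """Return the set of query IDs with at least one hit below max_evalue."""
    matched = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        qseqid, evalue = cols[0], float(cols[10])
        if evalue < max_evalue:
            matched.add(qseqid)
    return matched


def percent_shared(n_matched, n_total):
    """Percentage of proteins with a match, rounded to one decimal place."""
    return round(100.0 * n_matched / n_total, 1)


# Subject IDs and alignment statistics are illustrative placeholders.
example_lines = [
    "isotig_572\tLgig_subject_1\t80.0\t400\t80\t0\t1\t400\t1\t400\t4e-143\t500",
    "isotig_84589\tLgig_subject_2\t45.0\t600\t330\t5\t1\t600\t1\t600\t3e-93\t340",
    "isotig_hypothetical\tLgig_subject_3\t25.0\t60\t45\t2\t1\t60\t1\t60\t0.5\t20",
]
matched = queries_with_hits(example_lines, max_evalue=10e-6)
# 28 of the 59 proteins shared similarity, as reported in the text.
share = percent_shared(28, 59)
```

The final line reproduces the 47.5% figure given above from the reported counts.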

Several C. nemoralis shell proteins displayed extremely
high similarity with other molluscan shell forming proteins.
The four C. nemoralis proteins to share the highest similarity
with any other species were all shared with L. gigantea.
In order of similarity these were: isotig_572 (a filament
protein - see discussion above) at an e-value of 4e-143
(green link in Figure 4); isotig_84589 (a chitin-binding
domain-containing protein - see below) at an e-value of
3e-93 (green link in Figure 4); isotig_63304 (another
chitin-binding domain-containing protein) at an e-value of
2e-90 (blue link in Figure 4); and isotig_25891 (carbonic
anhydrase) at an e-value of 6e-74 (blue link in Figure 4). In
order to identify deeply conserved molluscan shell forming
proteins, we ordered all C. nemoralis proteins that were
found in any other shell proteome according to the number
of databases they were found in (Additional file 7). This
revealed two proteins that were found in four of the five
molluscan shell proteomes: Cnem821 and Cnem248122.
Cnem821 possesses a chitin-binding Peritrophin-A domain
(Carbohydrate Binding Module 14: Pfam PF01607).
The second protein found in four of the five molluscan
shell proteomes (Cnem248122) shared significant similarity
with chitinase proteins in Swissprot. These results
emphasize the prominent role that chitin is likely to play
in the construction of disparate molluscan shells and indeed
in many metazoan biominerals [48-52].

Figure 4 BLASTp comparisons of the C. nemoralis shell proteome against the shell proteomes derived from 3 bivalves and 2 gastropods.
Individual lines spanning the ideogram connect proteins that share significant similarity (e-values < 10e-6). Links are coloured by quartile of global similarity (top, 3rd, 2nd and lowest quartile): transparent red lines connect proteins with the lowest quartile of similarity (with a threshold of 10e-6) and green lines with the highest quartile of similarity. The percentage of each shell proteome that shared similarity with the C. nemoralis proteome is provided. Shell proteome datasets were derived from the following publications: P. maxima from [30]; P. margaritifera from [30]; H. asinina from [6] and [4]; L. gigantea from [33] and [5]; C. gigas from [7].

Other conserved C. nemoralis shell forming proteins
of interest include two carbonic anhydrase domain-containing
proteins (Cnem25891 and Cnem104312) and
two V-ATPase subunits (Cnem3883 and Cnem1237).
V-ATPases have not previously been shown to play a role
in molluscan shell formation (beyond their presence in
mantle EST or RNASeq datasets); however, it could be
expected that their ability to transport protons across
membranes would afford them a central role in the regulation
of shell formation. Indeed such proton pumps are known 
to play roles in the calcification of a variety of metazoan 
biominerals [53-55]. While one of the carbonic anhydrase 
domain-containing isotigs (Cnem104312) is clearly incomplete,
the other (Cnem25891) is potentially complete and
encodes a protein of 1,028 amino acids (the corresponding 
isotig contains 3,859 bp). This protein contains a signal se- 
quence, a carbonic anhydrase domain with phylogenetic 
affinity to the secreted and membrane bound α-CAs (see
Additional file 8 for a phylogenetic analysis) and a carb- 
oxyl region of relatively low complexity (Additional file 9). 
The shell-forming Nacrein proteins also contain carbonic 
anhydrase domains and have been previously described 
from gastropod and bivalve shells [56,57]. Interestingly, the
CA-domain in the C. nemoralis protein is not interrupted
by the low-complexity region as it is in the Nacreins, and
the C. nemoralis low complexity domain is composed of
Gln residues rather than Gly and Asn residues (Additional
file 9). The hydropathy profiles of all of these proteins
display similar characteristics (Additional file 10),
suggesting that the low complexity domains interact with the
water-soluble phases of the biomineralization processes.
Miyamoto et al. [58] reported that a recombinant Nacrein
protein inhibited the precipitation of CaCO3 in in vitro
calcification assays, and that removal of the repetitive
low-complexity domain attenuated this inhibitory activity.
While the results of such in vitro calcification assays
should always be interpreted with caution, this result indicates
that the low complexity domains of molluscan
shell forming α-CAs have a significant impact on the activity
of the enzyme to which they are fused or inserted.
The fact that the α-CA we have identified here has a significantly
different domain of low-complexity from previously
described shell-associated α-CAs suggests that
this gene may have a different evolutionary history.
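
As an illustration of how a hydropathy profile such as those compared above can be computed, the following minimal Python sketch averages per-residue hydropathy values over a sliding window. The Kyte-Doolittle scale and the window length of 9 are assumptions made for this example; the text does not state which scale or window was used for Additional file 10, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not stated in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return the mean hydropathy of every full window along seq.

    Residues missing from the scale (e.g. X) are scored as 0.0, which is
    a simplification made for this sketch.
    """
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    if window < 1 or window > len(scores):
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Under this scale a Gln-rich stretch scores strongly negative (hydrophilic) across the whole window, which is the kind of signal that would be consistent with a low complexity domain facing the water-soluble phase.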

Conclusions 

The 59 shell-associated proteins we report here repre- 
sent the largest collection of proteins from a pulmonate 
shell to date. The abundant proteins in this dataset either 
display no similarity to known proteins, or similarity to 
uncharacterized proteins. Comparisons of this dataset with 
other molluscan shell-forming proteomes indicate that al- 
most half of the C. nemoralis shell-forming proteome we 
describe here shares similarity with the shell-forming com- 
ponents of other molluscs. Two lines of experimental evi- 
dence failed to identify the presence of pigment-associated 
proteins in the C. nemoralis shell. Considering the clear as- 
sociation between a protein and a pigment in the juvenile 
Haliotis shell, this finding indicates that molluscan shell 
pigmentation mechanisms may have diverse evolutionary 
origins. This dataset will serve as an important platform 
from which further studies aimed at the characterization of 
pulmonate shell forming and pigmenting genes can be 
performed. 

Methods 

Generation of a reference C. nemoralis mantle 
transcriptome for proteomic surveys 

Seven total RNA extractions were prepared from the
distal-most edge (i.e. the shell forming region) of the mantle
tissue of 4 juvenile C. nemoralis individuals (two extractions
from each of three individuals for Illumina sequencing
and one extraction from a fourth individual for 454
sequencing) using Trizol according to the
manufacturer's instructions. These total RNA extractions
were sequenced on the Illumina HiSeq2000 and Roche 
454 platforms and assembled using the CLC Genomics 
workbench (version 5.5.2). This assembly generated a total 
of 676,358 contigs >100 bp, summing to a total of 193,892,905 bp
(max contig size = 14,765 bp, median contig size =
180 bp). In order to reduce the redundancy of this assembly 
for proteomic interrogation (see below) the following steps 
were applied. First, contigs shorter than 500 bp were re- 
moved. All remaining contigs were then clustered into 
isotigs using "usearch" [59]. The longest possible open 
reading frame (which was required to be >50 amino 
acids) was then extracted from each isotig using stand- 
ard Perl scripts. These translated ORFs were then 
clustered again using "usearch" to produce a total of
55,623 putative coding translated fragments. This data- 
set was used to conduct the proteomic surveys (see 
below). All assembled nucleic acid sequences discussed 
in this work are available in Additional file 11. 
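
The length filter and ORF extraction steps described above can be sketched in Python as follows. This is only an illustration, not the Perl scripts used in this work: the function names are hypothetical, an ORF is assumed here to be the longest stop-free translated stretch in any of the six reading frames (the text does not say whether a start codon was required), and the standard genetic code is assumed. The clustering steps performed with "usearch" are not reproduced.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def filter_contigs(contigs, min_bp=500):
    """Drop contigs shorter than min_bp (contigs: dict of name -> sequence)."""
    return {name: seq for name, seq in contigs.items() if len(seq) >= min_bp}


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate frame 0; codons with ambiguous bases become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq, min_aa=51):
    """Longest stop-free stretch over six frames, or None if <= 50 aa."""
    seq = seq.upper()
    best = ""
    for strand in (seq, revcomp(seq)):
        for frame in range(3):
            for piece in translate(strand[frame:]).split("*"):
                if len(piece) > len(best):
                    best = piece
    return best if len(best) >= min_aa else None
```

A typical use would be to call `filter_contigs` on the assembly, cluster the survivors, and then keep `longest_orf(seq)` for every isotig for which it is not `None`.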

Preparation of shell matrix and peptides 

Shells from approximately 100 freshly collected snails
were crushed into approximately 5 mm² fragments and
carefully sorted into three populations: dark brown (20.5 g 
derived from shells with a yellow background); yellow 
(23.7 g); and orange (12.3 g; see Figure 1). The shell pieces 
were either briefly washed with deionized water, or 
with sodium hypochlorite (6-14% active chlorine; Merck, 
Darmstadt, Germany) for 2 h at room temperature with 
5 min sonication and change of hypochlorite solution every 
30 min. Hypochlorite-treated shells were then washed with 
deionized water and dried. Washed shell pieces were demi- 
neralized in 50% acetic acid (20 mL/g of shell) for 16 h at 
4°C. The resulting suspensions were dialyzed successively 
against 3 × 10 vol 10% acetic acid and 2 × 10 vol 5% acetic
acid at 4-6°C (Spectra/Por 6 dialysis membrane, mw cut- 
off 2000; Spectrum Europe, Breda, The Netherlands). The 
dialyzed suspensions were centrifuged for 1 h at 4°C and 
12000 g(av). Pellets and supernatants were lyophilized and
analyzed separately.

LDS-PAGE was performed with pre-cast 4-12% gradi- 
ent Novex Bis-Tris gels in MES buffer using reagents 
and protocols supplied by the manufacturer (Invitrogen, 
Carlsbad, CA). The sample buffer was complemented 
with β-mercaptoethanol to a final concentration of 1%
and samples were heated to 70°C for 10 min. Sample 
buffer-insoluble material was removed by centrifugation 
at 16000 g for 5 min. Gels were loaded with the soluble 
fraction at 200 µg matrix/lane.

Reduction, carbamidomethylation and enzymatic cleavage
of matrix proteins were performed using a modification
of the FASP (Filter-aided sample preparation) method
[60]. Aliquots of 200 µg of acid-soluble or acid-insoluble
shell matrix were suspended in 200 µl of 0.1 M Tris, pH 8,
containing 6 M guanidine hydrochloride and 0.01 M
dithiothreitol (DTT). This mixture was heated to 56°C for
60 min, cooled to room temperature, and centrifuged at
13000 rpm in an Eppendorf bench top centrifuge 5415D
for 15 min. The supernatant was loaded into an Amicon
Ultra 0.5 ml 30 K filter device (Millipore; Tullagreen,
Ireland). DTT was removed by centrifugation at 13000 rpm
for 15 min and washing with 2 × 1 vol of the same buffer.
Carbamidomethylation was done in the device using 0.1 M
Tris buffer, pH 8, containing 6 M guanidine hydrochloride
and 0.05 mM iodoacetamide and incubation for 45 min in
the dark. Carbamidomethylated proteins were washed with
0.05 M ammonium hydrogen carbonate buffer, pH 8,
containing 6 M urea, and centrifugation as before. Each sample
was then incubated with 2 µg of Lysyl endopeptidase
(WAKO Chemicals GmbH, Neuss, Germany) in 40 µl of
0.05 mM ammonium hydrogen carbonate buffer containing
6 M urea for 6 h at 37°C. This was followed by addition of
4 µg of trypsin (Sequencing grade, modified; Promega,
Madison, USA) in 80 µl of 0.05 M ammonium hydrogen
carbonate buffer and further incubation at 37°C for 16 h.
Peptides were collected by centrifugation and the filters
were washed twice with 40 µl of 0.05 M ammonium hydrogen
carbonate buffer. The peptide solutions were acidified
to pH ~1 with trifluoroacetic acid and desalted for mass
spectrometric analysis with C18 Stage Tips [61].

LC-MS analysis 

Peptide mixtures were analyzed by on-line nanoflow liquid 
chromatography using the EASY-nLC 1000 system (Prox- 
eon Biosystems, Odense, Denmark, now part of Thermo 
Fisher Scientific) with 20 cm capillary columns of an internal
diameter of 75 µm filled with 1.8 µm Reprosil-Pur
C18-AQ resin (Dr. Maisch GmbH, Ammerbuch-Entringen, 
Germany). Peptides were eluted with a linear gradient from 
5-30% buffer B (80% acetonitrile in 0.1% formic acid) for 
100 min, 30-60% B for 12 min and 80-95% B for 8 min 
at a flow rate of 250 nl/min. The eluate was electro- 
sprayed into an Orbitrap Elite (Thermo Fisher Scientific, 
Bremen, Germany) using a Proxeon nanoelectrospray ion 
source. The Orbitrap Elite was operated essentially as pre- 
viously described [62] in a HCD top 10 mode with dy- 
namic selection of the 10 most intense peaks of each 
survey scan (300-1750 Th) for fragmentation. The resolution
was 120,000 for full scans and 15,000 for fragments
(both specified at m/z 400). Ion target values were 1e6 and
5e4ms, respectively. Dynamic exclusion time was 30 sec. 

Analysis of proteomic data 

Raw files were processed using the Andromeda search
engine-based version 1.3.9.21 of MaxQuant
(http://www.maxquant.org/) with the "second peptide", iBAQ
(intensity-based absolute quantitation [63], as implemented
in recent versions of MaxQuant) and "match between
runs" options enabled (match time window 0.5 min; alignment
time window 20 min) [64-66]. For protein identification the
C. nemoralis mantle transcriptome database (see above) 
was converted into a FASTA-formatted protein sequence
database, downloaded into MaxQuant, and automatically 
combined with the reversed sequences and sequences of 
common contaminants, such as human keratins. Carba- 
midomethylation was set as fixed modification. Variable 
modifications were oxidation (M), N-acetyl (protein), 
pyro-Glu/Gln (N-term) and phospho (STY). The initial 
mass tolerance for full scans was 7 ppm and 20 ppm for 
MS/MS. Two missed cleavages were allowed and the min- 
imal length required for a peptide was seven amino acids. 
The peptide and protein false discovery rates (FDR) were 
set to 0.01. The maximal posterior error probability (PEP),
which is the individual probability of each peptide to be a 
false hit considering identification score and peptide 
length, was set to 0.01. The minimal peptide score was 50. 
Two sequence-unique peptides were required to occur at 
least three times in two different replicates for high- 
confidence protein identifications. In exceptional cases 
single-sequence-unique identifications were accepted if 
the peptide occurred in at least three different replicates 
and was identified in both hypochlorite washed samples 
and water washed samples. 
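
To make the search parameters above more concrete, the following Python sketch shows what "two missed cleavages", a minimal peptide length of seven residues and a ppm mass tolerance mean in practice. It is not part of MaxQuant; the function names are hypothetical, and cleavage is assumed to occur C-terminal to every Lys or Arg regardless of the following residue (the exact enzyme rule chosen in the search engine is not stated in the text).

```python
def digest(protein, missed_cleavages=2, min_len=7):
    """In silico Lys-C/trypsin digest: cleave after every K or R (assumed rule).

    Returns the set of peptides with at most `missed_cleavages` internal
    cleavage sites and at least `min_len` residues.
    """
    pieces, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal remainder

    peptides = set()
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptide = "".join(pieces[i:j + 1])
            if len(peptide) >= min_len:
                peptides.add(peptide)
    return peptides


def ppm_window(mz, ppm):
    """Absolute half-width (in m/z units) of a tolerance given in ppm."""
    return mz * ppm / 1e6
```

For example, a 7 ppm tolerance at m/z 400 corresponds to a window of about ±0.0028 m/z units, and `digest(seq)` with the defaults lists the candidate peptides that a database search with these settings would consider for one protein sequence.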

Identifications with one or two sequence-unique pep- 
tides were routinely validated using the MaxQuant Ex- 
pert System software [67] considering the assignment of 
major peaks, occurrence of uninterrupted y- or b-ion 
series of at least four consecutive amino acids, preferred 
cleavages N-terminal to proline bonds, the possible pres- 
ence of a2/b2 ion pairs and immonium ions, mass accur- 
acy and score. Based on the sum of peak intensities, the 
iBAQ [63] option of MaxQuant was used to calculate 
the approximate proportion of each protein in the total 
identifiable proteome. 
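
The idea behind this calculation can be sketched as follows. In the iBAQ approach the summed peptide intensity of a protein is divided by the number of its theoretically observable peptides, and the resulting values can be expressed as a percentage of the total. The Python function below is a minimal illustration of that arithmetic and not MaxQuant's implementation; its name and the input dictionaries are hypothetical.

```python
def ibaq_proportions(summed_intensity, n_theoretical_peptides):
    """Approximate share (%) of each protein in the identified proteome.

    summed_intensity: dict of protein -> sum of its peptide peak intensities
    n_theoretical_peptides: dict of protein -> number of theoretically
        observable peptides (proteins with a count of 0 are skipped)
    """
    ibaq = {
        protein: intensity / n_theoretical_peptides[protein]
        for protein, intensity in summed_intensity.items()
        if n_theoretical_peptides.get(protein, 0) > 0
    }
    total = sum(ibaq.values())
    if total == 0:
        return {}
    return {protein: 100.0 * value / total for protein, value in ibaq.items()}
```

Normalising by the number of theoretical peptides keeps a long protein with many detectable peptides from appearing more abundant than a short protein present at the same molar amount.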

Sequence similarity searches were performed with
FASTA (http://www.ebi.ac.uk/Tools/sss/fasta/) [68] against
current releases of the UniProt Knowledgebase (UniProtKB).
Other bioinformatics tools used were Clustal Omega
for sequence alignments
(http://www.ebi.ac.uk/Tools/msa/clustalo/) [69], InterProScan
(http://www.ebi.ac.uk/Tools/pfa/iprscan/) [70] for domain
predictions, and SignalP 4.1
(http://www.cbs.dtu.dk/services/SignalP/) [71] for signal
sequence prediction.

Comparisons of molluscan shell forming proteomes 

BLASTp based comparisons of the C. nemoralis shell 
proteome were made against five previously published 
molluscan shell proteomes. These included 42 Pinctada 
maxima proteins reported by Marie et al. [30], 78 Pinc- 
tada margaritifera proteins reported by Marie et al. [30], 
253 Crassostrea gigas proteins reported by Zhang et al. 
[7], a combined set of 94 Haliotis asinina proteins re- 
ported by Marie et al. [6] and Jackson et al. [4], and a 
combined set of 631 Lottia gigantea proteins reported by 
Marie et al. [33] and Mann et al. [5]. The e-value threshold 
was set to 1e-06. These comparisons were made using a
modified version of Circoletto [72] which uses an implementation
of the legacy BLAST package. The following
command was issued to the circoletto.pl script:

    circoletto.pl -query XX -database XX -untangling_off -e_value 10e-6 -best_hit_type local -score2colour eval

The *.conf files generated by circoletto.pl were modified using
custom Perl scripts and then passed to Circos [73] in order to
generate an ideogram. The Circoletto "*.blasted" file, which
details all of the BLASTp results, is provided as Additional
file 12.
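
The ranking of C. nemoralis proteins by the number of shell proteomes in which they find a match can be illustrated with the short Python sketch below. It assumes that the BLASTp results for each proteome are available as standard 12-column tab-separated output (query in column 1, e-value in column 11); the layout of the Circoletto "*.blasted" file may differ, and the function names are hypothetical.

```python
from collections import defaultdict


def databases_per_query(tabular_by_db, max_evalue):
    """Map each query to the set of databases in which it has a hit.

    tabular_by_db: dict of database name -> iterable of tab-separated
        BLAST lines (assumed 12-column tabular format)
    max_evalue: hits with an e-value above this threshold are ignored
    """
    found = defaultdict(set)
    for db_name, lines in tabular_by_db.items():
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue  # skip blank and comment lines
            fields = line.rstrip("\n").split("\t")
            query, evalue = fields[0], float(fields[10])
            if evalue <= max_evalue:
                found[query].add(db_name)
    return dict(found)


def rank_by_conservation(found):
    """Queries ordered by number of databases hit (descending), then name."""
    return sorted(found, key=lambda q: (-len(found[q]), q))
```

The number of keys returned by `databases_per_query` divided by the number of query proteins gives the proportion of the proteome with at least one match, and the first entries of `rank_by_conservation` are the most widely shared proteins.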
Additional file 1: A list of all accepted identifications after manual
validation of all protein/protein group identifications provided 
by MaxQuant. 

Additional file 2: A complete list of protein identifications in 
matrices extracted from shells not treated with hypochlorite plus 
ultra-sonication. 

Additional file 3: A complete list of protein identifications in matrices 
extracted from shells treated with hypochlorite plus ultra-sonication 
before demineralization in acid. 

Additional file 4: The peptide data complementing protein data of 
Additional file 2. 

Additional file 5: The peptide data complementing protein data of 
Additional file 3. 

Additional file 6: A schematic representation of a C. nemoralis
putative trans-membrane protein (derived from isotig_5087), onto 
which the spatial distribution of the 72 LC-MS peptides are mapped. 

Additional file 7: Top BLASTp hits returned against C. nemoralis
queries from five molluscan shell proteomes and Swissprot. 

Additional file 8: A 50% majority rule consensus tree generated by 
Bayesian methods representing the phylogenetic relationships of 
metazoan CAs. 

Additional file 9: The nucleotide and derived protein sequence of
Cnem25891. 

Additional file 10: Hydropathy profiles of Cnem25891 and two 
previously reported molluscan shell-forming proteins which also 
possess CA domains.

Additional file 11: Contains the 59 isotigs (nucleic acid sequences) 
that were recovered by MaxQuant. 

Additional file 12: The Circoletto generated "*.blasted" file used to 
generate the Circos figure (Figure 4). 